# Multimodal Unified Representation

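Video-LLaVA is a multimodal model that learns a unified visual representation by aligning image and video features into a shared space before projecting them into the language model ("alignment before projection"). This lets a single model handle visual reasoning tasks over both images and videos.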
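
The idea behind "alignment before projection" can be illustrated with a toy sketch. It assumes that the image and video encoders have already mapped their outputs into one shared visual feature space, which is simulated here with random arrays. Because both modalities live in that space, a single projector can map either one into the language model's embedding space. The dimensions, the two-layer GELU MLP, and the names `SharedProjector` and `build_llm_input` are hypothetical choices for illustration. This is not Video-LLaVA's actual code or API.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
VIS_DIM = 16    # width of the shared (pre-aligned) visual feature space
LLM_DIM = 32    # hidden size of the language model
N_PATCHES = 4   # visual tokens per image or per video frame
N_FRAMES = 8    # frames sampled from a video


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class SharedProjector:
    """Hypothetical two-layer MLP used for BOTH images and videos."""

    def __init__(self, in_dim, out_dim, rng):
        self.w1 = rng.standard_normal((in_dim, out_dim)) * 0.02
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.standard_normal((out_dim, out_dim)) * 0.02
        self.b2 = np.zeros(out_dim)

    def __call__(self, feats):
        # feats: (num_tokens, in_dim) -> (num_tokens, out_dim)
        return gelu(feats @ self.w1 + self.b1) @ self.w2 + self.b2


def build_llm_input(visual_feats, text_embeds, projector):
    # Project visual tokens into the LLM space, then prepend them to the text tokens.
    return np.concatenate([projector(visual_feats), text_embeds], axis=0)


rng = np.random.default_rng(0)
projector = SharedProjector(VIS_DIM, LLM_DIM, rng)

# Simulated encoder outputs, assumed to be ALREADY aligned to the same visual space.
image_feats = rng.standard_normal((N_PATCHES, VIS_DIM))
video_feats = rng.standard_normal((N_FRAMES * N_PATCHES, VIS_DIM))
text_embeds = rng.standard_normal((5, LLM_DIM))

img_seq = build_llm_input(image_feats, text_embeds, projector)
vid_seq = build_llm_input(video_feats, text_embeds, projector)
print(img_seq.shape, vid_seq.shape)  # (9, 32) (37, 32)
```

In this sketch the projector has no modality-specific branch. Images and videos differ only in how many visual tokens they contribute to the sequence the language model sees.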

© 2025 AIbase